Abstract

Multihomed services can load-balance their incoming connection requests using DNS, resolving the name of the server to different addresses depending on the link load that corresponds to each address. Previous work has studied a number of problems with this approach, e.g., due to Time-to-Live (TTL) duration violations and client proximity to local DNS servers. In this paper, we experimentally evaluate a DNS-based ingress traffic engineering system that we deployed at Georgia Tech. Our objective is to understand whether simple and robust load balancing algorithms can be accurate in practice, despite the aforementioned problems with DNS-based load balancing methods. In particular, we examine the impact of various system parameters and of the main workload characteristics. We show that a window-based measurement scheme can be fairly accurate in practice, as long as its window duration has been appropriately configured.

1 Introduction 

As cloud services and content delivery become increasingly ubiquitous, multihoming is becoming an integral part of network infrastructure, both to distribute load and for failover. A recent study [4], for example, found that the average AS degree increased by one link per AS over the last 12 years.

Ingress Traffic Engineering (ITE) aims to select an incoming link among a set of possible links for the communication between a multihomed data center (or server farm) and its client population. The primary objective of ITE is to load-balance the incoming traffic to (and consequently, the outgoing traffic from) the data center. A secondary objective is to choose a better path (e.g., minimum delay or maximum available bandwidth) for each client. Typically, however, content providers are primarily interested in avoiding congestion in their own access links, and so the load-balancing objective is their primary concern.



Networks today employ two common approaches to perform ITE. The first approach relies on BGP and selective prefix advertisements [7][12]. This approach can balance load only at the level of IP address blocks, not client networks; it creates BGP churn in the Internet core; and it may be subject to BGP route dampening and convergence delays. The second approach uses the DNS infrastructure to dynamically select one of k IP addresses to resolve the server's name, where k is the number of incoming links. The DNS-based solution is becoming increasingly popular, since it can balance the server's load at the granularity of individual DNS requests, and it does not require the content provider to use BGP.

The most commonly deployed scenario of DNS-based ITE is the DNS-NAT architecture shown in Figure 1. Consider a multihomed network N that uses IP addresses from two ISPs X and Y. It is easy to control the assignment of outgoing connections (initiated from the data center servers) to egress links. Controlling which link carries the traffic of client-initiated connections, however, is more challenging. One way to do so is to rely on NATs and DNS, as follows. A server S in N is statically NATed with two IP addresses S_X and S_Y, from X and Y respectively. Clients requesting content from server S first resolve its hostname and then establish a TCP connection to S. Network N runs an authoritative DNS server for the domain name of S, which resolves each incoming DNS request from a client's Local DNS (LDNS) server with either S_X or S_Y. Thus, traffic between clients and the data center is routed on a per-LDNS basis through ISP X or Y. Of course, the same approach can be followed in the case of more than two upstream ISPs.

DNS-based ITE presents several hard challenges. First, the traffic that follows a DNS name resolution can consist of multiple TCP connections, due to DNS caching at the client. Second, multiple client sessions can follow the same LDNS request, due to caching at LDNS servers. Consequently, at DNS resolution time, we know neither the magnitude nor the duration of the traffic that follows each DNS request. Further, that traffic cannot be partitioned between different upstream ISPs. Third, TCP can cause significant variations in the throughput of the incoming load, making it harder to predict the incoming traffic on each interface. Fourth, there can be a significant delay between a DNS request (and the associated load balancing decision) and the arrival of the actual traffic that corresponds to that DNS request.
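To make the first two challenges concrete, here is a minimal Python sketch (not part of the deployed system) of a single caching LDNS in front of an authoritative server that alternates between S_X and S_Y. The class `LdnsCache`, its method names, and the timings are hypothetical; the point is only that every connection made while a cached answer is valid follows one earlier load-balancing decision.

```python
from itertools import cycle


class LdnsCache:
    """Hypothetical caching LDNS that honors the advertised TTL."""

    def __init__(self, authoritative_answer, ttl):
        self.authoritative_answer = authoritative_answer  # callable: () -> address
        self.ttl = ttl                  # advertised TTL in seconds
        self.cached = None              # cached address, if any
        self.expires = float("-inf")    # absolute expiry time of the cached answer
        self.upstream_queries = 0       # requests that reached the authoritative server

    def resolve(self, now):
        # Only a cache miss reaches the authoritative server, so only misses
        # give the load balancer a chance to pick a (possibly different) link.
        if now >= self.expires:
            self.cached = self.authoritative_answer()
            self.expires = now + self.ttl
            self.upstream_queries += 1
        return self.cached


# Authoritative server alternating between the two NATed addresses.
answers = cycle(["S_X", "S_Y"])
ldns = LdnsCache(lambda: next(answers), ttl=15.0)

# Five hidden clients behind the same LDNS each connect every 5 s for 30 s.
links = [ldns.resolve(now=float(t)) for t in range(0, 30, 5) for _ in range(5)]
print(ldns.upstream_queries, links.count("S_X"), links.count("S_Y"))
```

In this run, 30 connections from five clients are steered by only two authoritative decisions (15 connections to each address), so when it answers a query the load balancer cannot know how much traffic it is committing to a link.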

Previous work has studied a number of issues with DNS-based ITE approaches. First, short advertised DNS Time-To-Live (TTL) durations are not always honored by remote DNS servers or clients [11]. Second, it is not possible to migrate an ongoing TCP connection from one link (i.e., server address) to another. Third, clients are not always near their LDNS servers, which can affect the accuracy of ITE methods that attempt to select the best path for each client [8].

Theoretical studies have shown the effectiveness of randomized load balancing [10] and examined the problems caused by stale measurements [9]. It is known that prior information about incoming jobs can help [3][6]. In ITE, however, it is not always feasible to predict the arrivals of client DNS requests or the amount of traffic that follows them. In addition, empirical studies of ITE that complement our work include [2][5][1].

In this paper, we evaluate a DNS-based ingress traffic engineering system that we deployed at Georgia Tech. Our objective is to understand whether a simple but robust load balancing algorithm can be accurate in practice, despite all known problems with DNS-based load balancing methods. We also examine the impact of the measurement window duration on load balancing accuracy.

The rest of the paper is organized as follows. Section 2 presents workload measurements from the GTLIB content distribution service. Section 3 describes our ITE prototype and the emulated workload. Section 4 shows experimental results for the impact of key system and workload characteristics on load balancing accuracy. In Section 5, we analyze the impact of window duration on load balancing using a history-based measurement algorithm, and Section 6 concludes.



2 Workload Characteristics of a Content Provider

In this section, we present measurements from the GTLIB content distribution service at Georgia Tech. This service is provided by four servers whose hostname, www.gtlib.gatech.edu, is resolved in a round-robin fashion by two authoritative DNS servers. We collected full Netflow traces at the Georgia Tech campus border router. In parallel, we ran tcpdump at the site's authoritative DNS servers. The measurements were performed for a duration of 24 hours starting on 10th April 2008 at 9pm. We analyzed the collected traces (the capture clocks were kept synchronized via NTP) to understand, first, whether remote LDNS servers honor the advertised DNS TTL of 8 hours, and second, to characterize the traffic workload in terms of bytes per client and clients per LDNS.

How many LDNS servers honor the advertised TTL? We first look at the distribution of the minimum inter-arrival times of requests from remote LDNS servers. The A records for www.gtlib.gatech.edu pointed to 128.61.111.[9-12], and the advertised TTL was 8 hours. There were about 46,400 resolutions of type A. Figure 2(a) shows the distribution of minimum inter-arrival times for each LDNS server that contacted our authoritative servers. The figure shows that for around 60% of the remote LDNS servers, we received requests at most once per 8 hours. These LDNS servers either follow the advertised TTL or have a client resolution request rate lower than one per 8 hours (or both). For the remaining 40% of the LDNS servers, we expect that they either use a TTL shorter than 8 hours or do not cache at all. In other words, these LDNS servers violated the advertised TTL of 8 hours. Moreover, about 10% of them have a very short minimum inter-arrival period, which implies that they may not be doing any caching.
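The per-LDNS statistic behind Figure 2(a) can be computed along the following lines. This is a sketch under assumptions: the input is a list of (LDNS address, timestamp) pairs extracted from the tcpdump logs of type-A queries, the helper names are hypothetical, and LDNS servers that appear only once (which have no inter-arrival time) are simply skipped, since the text does not say how they were handled.

```python
from collections import defaultdict


def min_interarrivals(requests):
    """requests: iterable of (ldns_ip, timestamp_s) pairs for type-A queries.

    Returns {ldns_ip: minimum gap between consecutive requests}; LDNS servers
    seen only once have no inter-arrival time and are omitted.
    """
    times = defaultdict(list)
    for ldns, ts in requests:
        times[ldns].append(ts)
    result = {}
    for ldns, ts_list in times.items():
        ts_list.sort()
        gaps = [b - a for a, b in zip(ts_list, ts_list[1:])]
        if gaps:
            result[ldns] = min(gaps)
    return result


def ttl_compliance(requests, ttl):
    """Fraction of LDNS servers whose minimum inter-arrival is at least `ttl`."""
    mins = min_interarrivals(requests)
    if not mins:
        return float("nan")
    return sum(1 for gap in mins.values() if gap >= ttl) / len(mins)
```

With the 8-hour TTL, `ttl_compliance(requests, 8 * 3600)` gives the analogue of the roughly 60% figure above; as noted in the text, that fraction also includes LDNS servers that simply query rarely.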

Figure 2: Workload characteristics from GTLIB content distribution service. (a) Request inter-arrivals from same LDNS (x-axis: minimum LDNS request inter-arrival (s)); (b) Bytes per client; (c) Clients per LDNS.

How many clients correspond to each LDNS server? The challenge in this measurement is how to associate an LDNS request with all subsequently arriving connections from clients that have used that LDNS request to resolve the server's name. We use a simple approach to do this correlation, based on the hypothesis that an LDNS server and its associated clients belong to the same Autonomous System, and thus their addresses would both have the same BGP AS-Origin attribute (even though they often do not belong to the same IP address prefix). First, for each client request from an address A, we find earlier DNS requests from LDNS servers that belong to the same Autonomous System that advertises address A. If there are no such LDNS servers, or if there are multiple such servers, we ignore that client request. Otherwise, we associate that client request with the most recent DNS request from the corresponding LDNS server. Note that we ignore clients that resolved the GTLIB hostname before the start of our 24-hour dataset. We also ignore clients that round-robin between DNS servers in their resolver configurations; most operating systems either do not support this feature or disable it by default. Using this approach, we were able to associate 92% of the client sessions with an LDNS entry from the same origin AS as the client. In total, we identified 2864 unique LDNS servers that we could associate with specific client requests. Figure 2(c) shows the distribution of the number of clients using a given LDNS. We also show fitted Pareto and lognormal curves. Note that the Pareto distribution is a better fit, showing that the number of clients that correspond to each LDNS is highly skewed.
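The origin-AS heuristic can be sketched as follows. This is an illustration, not the authors' code: `origin_as` stands for an assumed lookup from an IP address to its BGP origin AS (for instance, a longest-prefix match against a BGP table dump), and the tuple layouts and function names are hypothetical. Client requests with zero or several candidate LDNS servers in the same AS are dropped, matching the rule described above.

```python
from bisect import bisect_right
from collections import defaultdict


def associate(dns_requests, client_requests, origin_as):
    """Match client requests to the LDNS request that most likely served them.

    dns_requests:    list of (timestamp, ldns_ip)
    client_requests: list of (timestamp, client_ip)
    origin_as:       assumed callable mapping an IP address to its origin AS
    Returns (client_ts, client_ip, ldns_ip, ldns_ts) tuples.
    """
    by_as = defaultdict(lambda: defaultdict(list))  # AS -> LDNS -> sorted timestamps
    for ts, ldns in sorted(dns_requests):
        by_as[origin_as(ldns)][ldns].append(ts)

    matches = []
    for cts, client in client_requests:
        # LDNS servers in the client's origin AS that queried before this request.
        candidates = {}
        for ldns, stamps in by_as.get(origin_as(client), {}).items():
            i = bisect_right(stamps, cts)
            if i > 0:
                candidates[ldns] = stamps[i - 1]  # most recent earlier request
        if len(candidates) == 1:  # ignore zero or multiple candidate LDNS servers
            (ldns, lts), = candidates.items()
            matches.append((cts, client, ldns, lts))
    return matches


def clients_per_ldns(matches):
    """Number of distinct client addresses associated with each LDNS."""
    clients = defaultdict(set)
    for _, client, ldns, _ in matches:
        clients[ldns].add(client)
    return {ldns: len(c) for ldns, c in clients.items()}
```

Counting distinct clients per LDNS from the returned matches, as `clients_per_ldns` does, yields the quantity whose distribution is shown in Figure 2(c).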

How many bytes were received by each client? Figure 2(b) shows the distribution of bytes that the GTLIB servers sent to or received from each unique client address over the course of the 24-hour trace. Note that the lognormal distribution is a better fit to the measured data.

We use these GTLIB observations to emulate a realistic workload in the experimental evaluation of the DNS-based ITE system described in the next section.

3 System Implementation and Deployment 

In this section, we describe our DNS-ITE prototype. The Georgia Tech campus network has several commercial and research providers. We multihome our server using two IP addresses, which are advertised to the Internet through Qwest, Cogent, and Internet2. The server's hostname is dynamically resolved by our DNS server to one of the two addresses.

Load-balancing Algorithms: We perform load balancing using two algorithms: (1) round-robin (RR) and (2) measurement-based (MB). The RR scheme selects the server's IP address in a round-robin fashion. The MB algorithm uses the recent history of ingress and egress traffic at the interfaces of our server to make a load-based decision about the next name resolution. Our goal is to understand how simple round-robin and history-based schemes work in practice.
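A minimal sketch of the two selection rules, assuming the measured per-link loads are supplied by the caller (the class names and the `select` interface are hypothetical, not the prototype's API):

```python
class RoundRobin:
    """RR: hand out the server addresses in turn, ignoring link load."""

    def __init__(self, addresses):
        self.addresses = list(addresses)
        self.next_index = 0

    def select(self, loads=None):
        address = self.addresses[self.next_index]
        self.next_index = (self.next_index + 1) % len(self.addresses)
        return address


class MeasurementBased:
    """MB: hand out the address whose link shows the lowest recent load."""

    def __init__(self, addresses):
        self.addresses = list(addresses)

    def select(self, loads):
        # loads[i] is the recent ingress + egress rate on the link behind
        # addresses[i]; min() returns the first index on ties.
        best = min(range(len(self.addresses)), key=lambda i: loads[i])
        return self.addresses[best]


rr = RoundRobin(["S_X", "S_Y"])
mb = MeasurementBased(["S_X", "S_Y"])
print([rr.select() for _ in range(3)], mb.select([4.0e6, 1.5e6]))
```

RR ignores its argument, so the same call site can drive either policy; the quality of MB rests entirely on how `loads` is measured, which is the subject of Section 5.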

Implementation: Our ITE system consists of two processes running on the same host: the DNS process and a monitoring process. The DNS process is a non-blocking, concurrent, non-recursive authoritative nameserver that serves LDNS resolution requests for our domain. The DNS process communicates with the monitoring process to get traffic measurements. This communication is done using shared memory; we have also tested an RPC-based mechanism that allows the two processes to run on different hosts. The monitoring process measures aggregate traffic utilization at the two interfaces of the server. We measure the load on each interface using a sliding window of length W = nw seconds, which consists of n small windows of length w. The sliding window moves in steps of w seconds. In our implementation, we set w = 100 ms. Our prototype can be extended to other traffic measurement methods, such as Netflow, and to multiple content servers.
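The W = nw window can be kept as a bounded ring of n per-slot byte counters. The sketch below is one assumed way to do this (the class and method names are hypothetical; the text does not describe the prototype's data structures): slots are indexed by floor(t / w), and a bounded deque discards the oldest slot each time the window advances by w.

```python
import math
from collections import deque


class SlidingWindowMeter:
    """Per-link traffic over a sliding window W = n * w (hypothetical helper).

    The window is made of n slots of length w and advances in steps of w.
    """

    def __init__(self, num_links, n, w=0.1):
        self.num_links = num_links
        self.w = w
        self.slots = deque([[0] * num_links], maxlen=n)  # newest slot on the right
        self.current = 0  # index of the newest slot, i.e. floor(time / w)

    def _advance(self, now):
        target = math.floor(now / self.w)
        # Open one empty slot per elapsed step of w; maxlen drops the oldest
        # slots, and a jump longer than W simply clears the window.
        steps = min(target - self.current, self.slots.maxlen)
        for _ in range(max(steps, 0)):
            self.slots.append([0] * self.num_links)
        self.current = max(self.current, target)

    def record(self, now, link, nbytes):
        """Account nbytes seen on `link` at time `now` (seconds)."""
        self._advance(now)
        self.slots[-1][link] += nbytes

    def rates(self, now):
        """Average rate per link (bytes/s) over the slots currently in the window."""
        self._advance(now)
        span = self.w * len(self.slots)
        return [sum(slot[i] for slot in self.slots) / span
                for i in range(self.num_links)]


# W = 10 s built from n = 100 slots of w = 100 ms.
meter = SlidingWindowMeter(num_links=2, n=100, w=0.1)
```

Because both links are divided by the same span, dividing by the covered span during warm-up (rather than by W) does not change which link an MB decision would pick.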

Server characteristics: We run an Apache server on Linux that serves content of client-requested size over HTTP. The server, DNS, and monitoring processes run on a 2 GHz hyper-threading-enabled Xeon with 1 GB of physical memory. At peak experiment loads of 5-10 Mbps, CPU usage does not exceed 20%.

Workload: We emulate a realistic workload using 40 clients on PlanetLab and 6 clients on RON. Some of our workload parameters are drawn from observations of Georgia Tech's GTLIB content distribution mirror. We chose these nodes such that they use different LDNS servers (i.e., have disjoint resolver configurations). Further, we picked LDNS servers that follow the advertised TTL (so that, for example, we can emulate LDNS servers that use a minimum TTL). Each experiment lasts 10 minutes.

The traffic model that we emulate is described next. First, the nominal TTL is 15 seconds, and we advertise it to the LDNS servers that emulate TTL-compliant behavior. The remaining 40% of LDNS servers emulate servers that do not follow the nominal TTL: to these, we advertise a TTL that is uniformly distributed in [5, 600] s. We always advertise the same TTL to a given LDNS server. The fraction of TTL violations is based on DNS logs from GTLIB. Second, clients follow a closed-loop (or interactive) arrival model, in which they download a file over TCP, sleep for some time, and then repeat this process. Unless stated otherwise, the sleep times are exponentially distributed with a mean of 35 seconds. Third, clients download lognormally distributed file sizes with a mean of 225 KB, based on 24 hours of Netflow data from GTLIB. However, we had to truncate the size distribution at 625 KB to avoid exceeding PlanetLab byte limits that could trigger rate-limiting. Fourth, there can be many clients behind an LDNS; we refer to these clients as hidden clients. We emulate them by spawning multiple simultaneous processes on the same client host. The number of hidden clients on each host is drawn from the uniform distribution on [1, 5]. Finally, the emulated clients have diverse path characteristics (RTT and available bandwidth) to our server, as would also be the case for a real content provider.
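The traffic model above can be sketched as a small generator. Parameters stated in the text (15 s nominal TTL, 40% violators with TTLs uniform in [5, 600] s, exponential idle times with a 35 s mean, lognormal sizes with a 225 KB mean truncated at 625 KB, and 1 to 5 hidden clients) are used directly. The lognormal shape `SIGMA` is an assumption, since only the mean is given; truncation is implemented here by redrawing oversized samples, which is one possible reading of "truncate"; and all function names are hypothetical.

```python
import math
import random

NOMINAL_TTL = 15.0         # s, advertised to TTL-compliant LDNS servers
VIOLATOR_FRACTION = 0.4    # share of LDNS servers emulating TTL violations
MEAN_SLEEP = 35.0          # s, mean of the exponential idle time
MEAN_SIZE_KB = 225.0       # mean of the lognormal file size (before truncation)
MAX_SIZE_KB = 625.0        # truncation point imposed by PlanetLab byte limits
SIGMA = 1.0                # ASSUMED lognormal shape parameter (not given in the text)
MU = math.log(MEAN_SIZE_KB) - SIGMA ** 2 / 2  # makes the untruncated mean 225 KB


def ldns_profile(rng):
    """Advertised TTL (s) and number of hidden clients for one emulated LDNS."""
    if rng.random() < VIOLATOR_FRACTION:
        ttl = rng.uniform(5.0, 600.0)
    else:
        ttl = NOMINAL_TTL
    return ttl, rng.randint(1, 5)  # hidden clients, uniform on {1, ..., 5}


def file_size_kb(rng):
    """Lognormal size, truncated by redrawing values above MAX_SIZE_KB."""
    while True:
        size = rng.lognormvariate(MU, SIGMA)
        if size <= MAX_SIZE_KB:
            return size


def client_downloads(rng, duration=600.0):
    """(start_time_s, size_kb) for one closed-loop client over one experiment.

    Simplification: the transfer time of each download is ignored.
    """
    t, downloads = 0.0, []
    while True:
        t += rng.expovariate(1.0 / MEAN_SLEEP)
        if t >= duration:
            return downloads
        downloads.append((t, file_size_kb(rng)))


rng = random.Random(1)
profiles = [ldns_profile(rng) for _ in range(46)]  # 40 PlanetLab + 6 RON hosts
downloads = [client_downloads(rng) for ttl, hidden in profiles for _ in range(hidden)]
```

Because truncation removes the upper tail, the effective mean file size in this sketch is somewhat below 225 KB; matching the measured mean after truncation would require re-fitting `MU` and `SIGMA`.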

4 DNS-ITE Performance 

The accuracy of any load balancing scheme depends on the job size granularity at which we can "route" jobs to servers. In the context of DNS-ITE, this granularity is the number of bytes that correspond to each LDNS request. If each LDNS request were followed by only a few bytes worth of load, we would be able to achieve much more accurate load balancing than if each LDNS request were followed by a large and long file transfer. In this section, we start with a model that describes the factors on which the DNS-ITE load balancing granularity depends. We then empirically evaluate how the accuracy of a round-robin load balancer depends on these parameters.

Consider n clients behind each LDNS server. Suppose each client downloads s bytes from the server in each connection. Let the arrival rate of connections per client be r, and the arrival rate of DNS requests from each LDNS be λ. If a remote LDNS server uses caching, the TTL that it follows is T seconds. Then,

$$\lambda = \begin{cases} nr & \text{if the LDNS is non-caching} \\ \min\{nr,\ 1/T\} & \text{if the LDNS caches with TTL } T \end{cases}$$

The traffic rate that corresponds to an LDNS server is given by R = nrs (bytes per second). The granularity at which we can balance the arriving traffic is:

$$\frac{R}{\lambda} = \begin{cases} s & \text{if the LDNS is non-caching, or } nr < 1/T \\ nrsT & \text{otherwise} \end{cases} \tag{1}$$
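Equation 1 can be evaluated directly. The following sketch (hypothetical function names; s in bytes and rates per second, as defined above) computes λ and the granularity R/λ:

```python
def ldns_request_rate(n, r, ttl=None):
    """DNS request rate lambda from one LDNS (requests/s).

    n: clients behind the LDNS; r: connections/s per client;
    ttl: TTL honored by a caching LDNS, or None for a non-caching LDNS.
    """
    if ttl is None:
        return n * r
    return min(n * r, 1.0 / ttl)


def granularity(n, r, s, ttl=None):
    """Bytes steered by one load-balancing decision: R / lambda, with R = n*r*s."""
    return n * r * s / ldns_request_rate(n, r, ttl)


print(granularity(1, 1 / 35, 225e3, ttl=15), granularity(5, 1 / 35, 225e3, ttl=15))
```

For instance, with r = 1/35 per second, s = 225 KB, and T = 15 s, a single client per LDNS stays in the nr < 1/T regime, so each decision steers about one file, whereas five clients push the LDNS into the caching regime, where each decision steers nrsT, roughly 2.1 files' worth of traffic.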

Next, we quantify the effect of the parameters that control the ratio R/λ on the load balancing accuracy that we can achieve with a simple RR algorithm.

The error metric: We quantify load balancing accuracy in terms of the relative difference between the utilizations of the two links, measured at an averaging timescale of τ seconds. More precisely, we measure the traffic utilization (in bps) $U_{\tau,1}(t)$ and $U_{\tau,2}(t)$ of the two links at our server in a sliding window of length τ that starts at time t. Under perfect load balancing conditions, the load on each link during the interval (t, t + τ) should be $[U_{\tau,1}(t) + U_{\tau,2}(t)]/2$. The load balancing error $\epsilon_\tau(t)$ is defined as:

$$\epsilon_\tau(t) = \frac{|U_{\tau,1}(t) - U_{\tau,2}(t)|}{U_{\tau,1}(t) + U_{\tau,2}(t)}$$

We recompute $\epsilon_\tau(t)$ every second.
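A sketch of how $\epsilon_\tau(t)$ and its median can be computed from a packet trace. The packet tuple format and function names are hypothetical, as is the convention of returning 0 for a window with no traffic; τ = 20 s and the one-second recomputation step follow the text.

```python
import statistics


def relative_error(packets, t, tau):
    """epsilon_tau(t) for two links over the window [t, t + tau).

    packets: list of (timestamp_s, link, nbytes) with link in {0, 1}.
    Returns 0.0 when neither link carries traffic (an assumed convention).
    """
    u = [0.0, 0.0]
    for ts, link, nbytes in packets:
        if t <= ts < t + tau:
            u[link] += 8 * nbytes / tau  # utilization in bits per second
    total = u[0] + u[1]
    return abs(u[0] - u[1]) / total if total > 0 else 0.0


def median_error(packets, duration, tau=20.0, step=1.0):
    """Median of epsilon_tau(t), recomputed every `step` seconds."""
    packets = list(packets)
    errors = []
    t = 0.0
    while t + tau <= duration:
        errors.append(relative_error(packets, t, tau))
        t += step
    if not errors:
        raise ValueError("experiment shorter than the averaging timescale tau")
    return statistics.median(errors)
```

This brute-force version rescans the trace for every window, which is adequate for a 10-minute experiment; a longer trace would call for a running sum per link.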

Figure 3 shows the impact of the sliding window length τ on the median error $\epsilon_\tau(t)$ for one of our experiments. As expected, the load balancing error increases as we decrease τ, because there are fewer arriving connection requests in shorter intervals. Also note that the error metric tends to stabilize when τ is larger than about 15 seconds. This shows that the RR load balancing algorithm is not able to eliminate the relative error, even when we use a fairly long (15 s) averaging timescale. In the rest of this paper, we use τ = 20 s.

Figure 3: Variation of median error with averaging timescale τ.

Aggregate load: Our goal in this experiment is to examine the impact of the aggregate traffic load on the relative error. We do so by varying the number of deployed LDNS servers (and hence the number of active clients) from 10 to 45. Figure 5 shows the distribution of the relative error. Note that the error decreases as the aggregate utilization increases. As we decrease the number of LDNS servers and associated clients, the aggregate load drops, which lowers the arrival rate of DNS requests. Thus, the load balancer has fewer opportunities to distribute the arriving load between the two links.

File size distribution: Here, we first examine the effect of the requested file size and second, the effect of the significant variability of the lognormal file size distribution compared to the case of constant file sizes. To keep the aggregate load fixed, we adjust the mean idle period between requests from each client. The top part of Figure 4 shows the effect of increasing the file size on the error distribution, when clients request fixed-size files. Note that as we increase the file size from 30 KB to 625 KB, the errors increase. This is expected, because with smaller flows our DNS server can make more frequent load balancing decisions, amortizing the load between the two links at a finer granularity.

The bottom part of Figure 4 shows the differences between fixed-size transfers and lognormally sized transfers. We use a truncation size of 1 MB for the latter. We see that the errors increase when the content size is heavy-tailed. The main reason is that the round-robin scheduler does not consider the current load on each link. Thus, in a heterogeneous workload with transfers of different sizes, one link may receive several long transfers while the other receives mostly short transfers, causing periods of significant imbalance.

Figure 4: Effect of file size (RR).

Advertised TTL: Next, we illustrate the effect of the advertised TTL on load balancing accuracy. Equation 1 shows that accuracy depends on the TTL T that is advertised to LDNS servers, as long as the request rate nr from each LDNS is larger than 1/T. When the request rate is less than that, we expect that the load balancing error will not depend on the TTL.

We examined the impact of the advertised TTL on the median of the relative error ε as follows. The client request rate r is set to once per 35 s. Hence, the inter-arrival time 1/(nr) of requests from an LDNS varies between 7 s and 35 s, depending on the number n of clients per LDNS. Figure 6 shows the median error and Wilcoxon-based 99% confidence intervals for different advertised TTLs. Consistent with Equation 1, when the time period between successive DNS requests from the same LDNS is less than the advertised TTL, the error increases with the TTL (TTL values of 1, 5, and 15 seconds). For the two larger TTL values, the load balancing error does not increase with the TTL, because the client requests arrive too infrequently to be affected by DNS caching.

Hidden clients: We also investigate the effect of the number of clients per LDNS. We keep the number of clients constant across all LDNS servers. To keep the aggregate load fixed, we adjust the number of active LDNS servers. We also set the mean client idle period to 14 s, so that even with a single client per LDNS, successive client requests arrive more often than once per advertised TTL (15 s).

Figure 5: Effect of aggregate load (RR).

Figure 7 shows the load balancing error when we increase the number of clients per LDNS from one to five. We see that the case of five clients per LDNS shows a clear increase in the load balancing error compared to the cases of one and three hidden clients. Hidden clients increase load balancing errors because the amount of traffic that corresponds to each LDNS request grows with their count.

5 Measurement-based DNS-ITE 

In this section, we evaluate the performance of a measurement-based load balancing scheme and examine the impact of the measurement history on its accuracy. Intuitively, the worst-case scenario for a round-robin scheme is when requests arrive in {large, small} pairs: round-robin then sends every large transfer to one link and every small transfer to the other. A measurement-based scheme can alleviate such problems by routing clients based on the current link loads.

In the measurement-based (MB) scheme that we consider in this paper, we measure the incoming and outgoing load on each link of our content server using a sliding window of duration W. On a new LDNS request, we compare the latest utilization measurements of the two links and advertise the interface with the minimum load.

Comparison between RR and MB: We compare MB with RR load balancing using the same workload model as in the previous section. Figure 8 shows the distribution of load balancing errors at the τ = 20 s timescale for different window sizes, and for the RR scheme. We see that a large window of W = 30 s performs worse than the RR scheme, while a small window of W = 100 ms performs marginally better than RR. A window size of W = 10 s is significantly better than RR. Clearly, the parameter W has a significant impact on the accuracy of MB, so we need to understand why.

Figure 6: Effect of advertised TTL (RR).

An important issue in any load balancing scheme that relies on historical data is whether those past measurements are stale, meaning that they no longer reflect the current loads [9]. Intuitively, a larger value of W is more prone to such errors than a shorter W. However, the magnitude of W also controls the variance of the measurements: a shorter W yields noisier measurements, making it harder to accurately estimate the load on each link. An appropriate value of W must carefully balance this staleness-vs-variance trade-off, based on the dynamics and burstiness of the underlying traffic.

Staleness-vs-variance trade-off: To illustrate the effects of measurement staleness and variance on load balancing accuracy, we start with an experiment using two simple, synthetic traffic models. In the first model (CBR), each flow has a constant size and duration, and its packets are transmitted periodically. Here, the only variability in the aggregate traffic is due to flow start/finish events. In the second model, the traffic is generated from an aggregate of Pareto renewal processes (for the inter-packet gaps), causing significant variability in the packet inter-arrivals. There are no flow start/finish events, however, which means that there are no abrupt changes in the arriving traffic rate.

Figure 7: Effect of hidden clients (RR).

Figure 9 shows the load balancing error distribution for the two models, for different values of W. For CBR, a small value of W (0.1 s) performs best. The reason is that in CBR the measurement variance is minimal (each flow consists of periodic packet arrivals), and even a very short measurement window suffices to estimate the load of each link accurately. A larger W, say 10 s, is detrimental because it is subject to stale measurements (a different number of active flows than are currently active). With the Pareto model, on the other hand, we get lower errors with W = 1 s than with W = 0.1 s. The reason is that this traffic is more bursty, so we need a longer time period to reliably determine which link is more heavily loaded.

There is no "optimal" value of W that is independent of the statistical characteristics of the underlying traffic. The general guideline that we can provide is that W should be as short as possible (to avoid staleness), subject to the constraint that the measurement variance is low enough to reliably show which link has the minimum load. Note that the objective is not to estimate the load on each link accurately; we are simply interested in identifying the link with the minimum load.

Figure 8: RR versus MB load balancing.

Figure 9: Staleness-vs-variance trade-off.

What is an appropriate value of W for the TCP-based workload that we used in our experiments? To answer this question, we simulated the MB load balancer on packet traces from the W = 10 s experiment. Specifically, we partitioned the experiment duration into 100 ms intervals, identifying the load from different LDNS servers in each interval. We then used a (hypothetical) window size W to make MB load balancing decisions for any new DNS requests in that interval, and measured the resulting load balancing errors at τ = 20 s. Figure 10 shows the median error as a function of W. We see that a measurement window W in [5, 15] s gives the lowest errors. With smaller window sizes we see the negative effects of variance, while with larger windows we observe the negative effects of staleness. Interestingly, there is a wide range of W in which the load balancing accuracy is almost constant, implying that W may not need to be fine-tuned in practice.

Figure 10: Effect of W on MB accuracy (median error versus window size W (s)).

Limitations of MB DNS-ITE: The MB scheme can improve load balancing accuracy compared to RR, but it can still suffer from an intrinsic problem of DNS-based ITE: there can be a significant delay between a DNS request (and the associated load balancing decision) and the arrival of the actual traffic that corresponds to that DNS request. To illustrate this effect, consider the following simplified model of DNS-ITE.

Suppose that we receive DNS requests at a constant rate λ. The traffic associated with each DNS request can originate from multiple TCP connections and from multiple clients. Further, there can be a significant delay between a load balancing decision and the arrival of the actual traffic load. For a single connection, this delay includes the DNS response, the TCP connection establishment phase, and the slow-start phase. Let δ be the delay between a DNS request and the time that the associated traffic arrives. During that period, we can receive δλ other DNS requests, which will be routed without considering the load that has already been committed (but has not yet arrived) to each link. If δ > 1/λ, i.e., if δ is significant compared to the inter-arrival time of DNS requests, the MB scheme will not be able to correctly amortize the load between the available links, at least at short timescales. We view this as a fundamental problem of the DNS-ITE method, which cannot be avoided given that the delay δ is unknown and varies across LDNS servers and connection requests.
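The effect of δ can be reproduced with a deliberately idealized simulation. Assumptions (all hypothetical, not the paper's setup): DNS requests arrive exactly every 1/λ seconds, each request's traffic is a unit-rate flow of fixed duration that starts δ seconds after the decision, and the MB balancer sees the instantaneous number of arrived flows per link, with no measurement window at all.

```python
def simulate_mb(num_requests, interarrival, delay, flow_duration):
    """Link choices (0 or 1) of an idealized MB balancer under decision delay.

    Request k arrives at time k * interarrival (so lambda = 1 / interarrival).
    Its traffic is a unit-rate flow on the chosen link during
    [t + delay, t + delay + flow_duration). The balancer counts only flows
    that have already started, i.e. it ignores committed-but-not-arrived load.
    """
    flows = []    # (link, start, end)
    choices = []
    for k in range(num_requests):
        t = k * interarrival
        load = [0, 0]
        for link, start, end in flows:
            if start <= t < end:
                load[link] += 1
        link = 0 if load[0] <= load[1] else 1  # ties go to link 0
        choices.append(link)
        flows.append((link, t + delay, t + delay + flow_duration))
    return choices


print(simulate_mb(10, interarrival=1.0, delay=0.0, flow_duration=10.0))
print(simulate_mb(10, interarrival=1.0, delay=3.0, flow_duration=10.0))
```

With δ = 0 the choices alternate perfectly (0, 1, 0, 1, ...). With δ = 3/λ, the first three requests all go to link 0 because none of their traffic has arrived yet, and the balancer then overcorrects with five consecutive requests to link 1, which is the short-timescale imbalance described above.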

6 Conclusion 

In this work, we looked at the problem of ingress traffic load balancing using DNS-based techniques (ITE) in multihomed networks. We implemented an ITE load balancer for a content server and designed a wide-area client testbed with realistic workload characteristics. Our contributions are twofold.

First, we showed that the accuracy of ITE can be affected by factors that include (1) aggregate client load, (2) DNS TTL policies in client networks, (3) hidden clients, and (4) heavy-tailed content sizes. We found that a large aggregate load (1) can improve accuracy, while TTL violations (2), hidden clients (3), and heavy-tailed content sizes (4) can degrade it. These observations can be used to design a content distribution service that aids load balancing.

Second, we showed that measurement-based (MB) schemes improve performance over a round-robin scheme when the length of the measurement history is appropriately chosen: short enough to avoid stale measurements, yet long enough to keep measurement variance low. We evaluated the impact of high variance and staleness in the measurement history. Finally, we looked at limitations of MB schemes that stem from the inherent nature of the ITE problem.